Instruction Precomputation: Dynamically Removing Redundant Computations Using Profiling

Abstract

As a program executes, some computations are performed over and over again. These redundant computations increase the program's execution time because they take multiple cycles to execute and consume limited processor resources. To minimize the performance degradation that redundant computations cause, we propose using Instruction Precomputation hardware to dynamically remove them. Instruction Precomputation profiles the program to determine the most frequently occurring redundant computations, which are loaded into the Precomputation Table before the program executes. During program execution, the processor accesses the Precomputation Table to determine whether an instruction is a redundant computation; instructions that are redundant computations receive their output value from the Precomputation Table and are removed from the pipeline. The key difference between Instruction Precomputation and Value Reuse, another microarchitectural technique that dynamically removes redundant computations, is that Instruction Precomputation does not dynamically update the Precomputation Table with the most recent redundant computations, since the table already contains the highest-frequency ones. Consequently, Instruction Precomputation requires less chip area and has less impact on the clock period than Value Reuse. For a 2048-entry Precomputation Table, dynamically removing redundant computations yields an average speedup of 10.53%; by comparison, a 2048-entry Value Reuse Table produces an average speedup of 7.43%.

© 2001 by CRC Press LLC
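The following Python sketch contrasts the two table-management policies described in the abstract on a synthetic instruction trace. It is a minimal model under stated assumptions, not the paper's hardware design: a computation is identified by a hypothetical `(opcode, operand_a, operand_b)` tuple, the tiny `ALU` dictionary and all function names are illustrative, the Precomputation Table is built by profiling the same trace it later runs (the most favorable case for a static table), and the Value Reuse Table is assumed to use least-recently-used replacement. It counts table hits only and does not model cycles, so it says nothing about actual speedup.

```python
from collections import Counter, OrderedDict
from typing import Dict, List, Tuple

# Assumption: a computation is keyed by (opcode, operand_a, operand_b).
# The opcode names and this tiny ALU are illustrative only.
Computation = Tuple[str, int, int]

ALU = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}


def execute(comp: Computation) -> int:
    """Produce the result a functional unit would compute."""
    op, a, b = comp
    return ALU[op](a, b)


def build_precomputation_table(profile_trace: List[Computation],
                               size: int) -> Dict[Computation, int]:
    """Profiling pass: keep only the `size` most frequent computations."""
    counts = Counter(profile_trace)
    return {comp: execute(comp) for comp, _ in counts.most_common(size)}


def run_with_precomputation(trace: List[Computation],
                            table: Dict[Computation, int]) -> int:
    """Count instructions served by the static table (never updated at run time)."""
    hits = 0
    for comp in trace:
        if comp in table:
            hits += 1          # result forwarded; execution skipped
        else:
            execute(comp)      # normal execution; table left untouched
    return hits


def run_with_value_reuse(trace: List[Computation], size: int) -> int:
    """Count hits in a reuse table that is updated on every miss (LRU, assumed)."""
    table: "OrderedDict[Computation, int]" = OrderedDict()
    hits = 0
    for comp in trace:
        if comp in table:
            hits += 1
            table.move_to_end(comp)          # mark as most recently used
        else:
            table[comp] = execute(comp)      # insert newest computation
            if len(table) > size:
                table.popitem(last=False)    # evict least recently used
    return hits


# Usage: two hot computations interleaved with one-off computations.
frequent = [("add", 1, 2), ("mul", 3, 4)]
trace: List[Computation] = []
for i in range(4):
    trace += [frequent[0], ("add", i, 100), frequent[1], ("sub", i, 200)]

table = build_precomputation_table(trace, size=2)
print(run_with_precomputation(trace, table))  # 8
print(run_with_value_reuse(trace, size=2))    # 0
```

In this contrived trace, each hot computation is separated from its next occurrence by three distinct computations, so a two-entry LRU table always evicts it before it can be reused, while the profiled static table keeps both hot computations resident. With other traces or larger tables the comparison can come out differently.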

Publication date: 2004